{
 "cells": [
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "from urllib.request import urlopen\n",
    "from bs4 import BeautifulSoup\n",
    "html = urlopen('http://www.pythonscraping.com/pages/warandpeace.html')\n",
    "bs = BeautifulSoup(html, 'html.parser')\n",
    "print(bs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from urllib.request import urlopen\n",
    "from bs4 import BeautifulSoup\n",
    "html = urlopen('http://www.pythonscraping.com/pages/warandpeace.html')\n",
    "bs = BeautifulSoup(html, \"html.parser\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Anna\n",
      "Pavlovna Scherer\n",
      "Empress Marya\n",
      "Fedorovna\n",
      "Prince Vasili Kuragin\n",
      "Anna Pavlovna\n",
      "St. Petersburg\n",
      "the prince\n",
      "Anna Pavlovna\n",
      "Anna Pavlovna\n",
      "the prince\n",
      "the prince\n",
      "the prince\n",
      "Prince Vasili\n",
      "Anna Pavlovna\n",
      "Anna Pavlovna\n",
      "the prince\n",
      "Wintzingerode\n",
      "King of Prussia\n",
      "le Vicomte de Mortemart\n",
      "Montmorencys\n",
      "Rohans\n",
      "Abbe Morio\n",
      "the Emperor\n",
      "the prince\n",
      "Prince Vasili\n",
      "Dowager Empress Marya Fedorovna\n",
      "the baron\n",
      "Anna Pavlovna\n",
      "the Empress\n",
      "the Empress\n",
      "Anna Pavlovna's\n",
      "Her Majesty\n",
      "Baron\n",
      "Funke\n",
      "The prince\n",
      "Anna\n",
      "Pavlovna\n",
      "the Empress\n",
      "The prince\n",
      "Anatole\n",
      "the prince\n",
      "The prince\n",
      "Anna\n",
      "Pavlovna\n",
      "Anna Pavlovna\n"
     ]
    }
   ],
   "source": [
    "nameList = bs.findAll('span', {'class': 'green'})\n",
    "for name in nameList:\n",
    "    print(name.get_text())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[<h1>War and Peace</h1>, <h2>Chapter 1</h2>]\n"
     ]
    }
   ],
   "source": [
    "titles = bs.find_all(['h1', 'h2','h3','h4','h5','h6'])\n",
    "print([title for title in titles])\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[<span class=\"red\">Well, Prince, so Genoa and Lucca are now just family estates of the\n",
      "Buonapartes. But I warn you, if you don't tell me that this means war,\n",
      "if you still try to defend the infamies and horrors perpetrated by\n",
      "that Antichrist- I really believe he is Antichrist- I will have\n",
      "nothing more to do with you and you are no longer my friend, no longer\n",
      "my 'faithful slave,' as you call yourself! But how do you do? I see\n",
      "I have frightened you- sit down and tell me all the news.</span>, <span class=\"green\">Anna\n",
      "Pavlovna Scherer</span>, <span class=\"green\">Empress Marya\n",
      "Fedorovna</span>, <span class=\"green\">Prince Vasili Kuragin</span>, <span class=\"green\">Anna Pavlovna</span>, <span class=\"green\">St. Petersburg</span>, <span class=\"red\">If you have nothing better to do, Count [or Prince], and if the\n",
      "prospect of spending an evening with a poor invalid is not too\n",
      "terrible, I shall be very charmed to see you tonight between 7 and 10-\n",
      "Annette Scherer.</span>, <span class=\"red\">Heavens! what a virulent attack!</span>, <span class=\"green\">the prince</span>, <span class=\"green\">Anna Pavlovna</span>, <span class=\"red\">First of all, dear friend, tell me how you are. Set your friend's\n",
      "mind at rest,</span>, <span class=\"red\">Can one be well while suffering morally? Can one be calm in times\n",
      "like these if one has any feeling?</span>, <span class=\"green\">Anna Pavlovna</span>, <span class=\"red\">You are\n",
      "staying the whole evening, I hope?</span>, <span class=\"red\">And the fete at the English ambassador's? Today is Wednesday. I\n",
      "must put in an appearance there,</span>, <span class=\"green\">the prince</span>, <span class=\"red\">My daughter is\n",
      "coming for me to take me there.</span>, <span class=\"red\">I thought today's fete had been canceled. I confess all these\n",
      "festivities and fireworks are becoming wearisome.</span>, <span class=\"red\">If they had known that you wished it, the entertainment would\n",
      "have been put off,</span>, <span class=\"green\">the prince</span>, <span class=\"red\">Don't tease! Well, and what has been decided about Novosiltsev's\n",
      "dispatch? You know everything.</span>, <span class=\"red\">What can one say about it?</span>, <span class=\"green\">the prince</span>, <span class=\"red\">What has been decided? They have decided that\n",
      "Buonaparte has burnt his boats, and I believe that we are ready to\n",
      "burn ours.</span>, <span class=\"green\">Prince Vasili</span>, <span class=\"green\">Anna Pavlovna</span>, <span class=\"green\">Anna Pavlovna</span>, <span class=\"red\">Oh, don't speak to me of Austria. Perhaps I don't understand\n",
      "things, but Austria never has wished, and does not wish, for war.\n",
      "She is betraying us! Russia alone must save Europe. Our gracious\n",
      "sovereign recognizes his high vocation and will be true to it. That is\n",
      "the one thing I have faith in! Our good and wonderful sovereign has to\n",
      "perform the noblest role on earth, and he is so virtuous and noble\n",
      "that God will not forsake him. He will fulfill his vocation and\n",
      "crush the hydra of revolution, which has become more terrible than\n",
      "ever in the person of this murderer and villain! We alone must\n",
      "avenge the blood of the just one.... Whom, I ask you, can we rely\n",
      "on?... England with her commercial spirit will not and cannot\n",
      "understand the Emperor Alexander's loftiness of soul. She has\n",
      "refused to evacuate Malta. She wanted to find, and still seeks, some\n",
      "secret motive in our actions. What answer did Novosiltsev get? None.\n",
      "The English have not understood and cannot understand the\n",
      "self-abnegation of our Emperor who wants nothing for himself, but only\n",
      "desires the good of mankind. And what have they promised? Nothing! And\n",
      "what little they have promised they will not perform! Prussia has\n",
      "always declared that Buonaparte is invincible, and that all Europe\n",
      "is powerless before him.... And I don't believe a word that Hardenburg\n",
      "says, or Haugwitz either. This famous Prussian neutrality is just a\n",
      "trap. I have faith only in God and the lofty destiny of our adored\n",
      "monarch. He will save Europe!</span>, <span class=\"red\">I think,</span>, <span class=\"green\">the prince</span>, <span class=\"red\">that if you had been\n",
      "sent instead of our dear <span class=\"green\">Wintzingerode</span> you would have captured the\n",
      "<span class=\"green\">King of Prussia</span>'s consent by assault. You are so eloquent. Will you\n",
      "give me a cup of tea?</span>, <span class=\"green\">Wintzingerode</span>, <span class=\"green\">King of Prussia</span>, <span class=\"red\">In a moment. A propos,</span>, <span class=\"red\">I am\n",
      "expecting two very interesting men tonight, <span class=\"green\">le Vicomte de Mortemart</span>,\n",
      "who is connected with the <span class=\"green\">Montmorencys</span> through the <span class=\"green\">Rohans</span>, one of\n",
      "the best French families. He is one of the genuine emigres, the good\n",
      "ones. And also the <span class=\"green\">Abbe Morio</span>. Do you know that profound thinker? He\n",
      "has been received by <span class=\"green\">the Emperor</span>. Had you heard?</span>, <span class=\"green\">le Vicomte de Mortemart</span>, <span class=\"green\">Montmorencys</span>, <span class=\"green\">Rohans</span>, <span class=\"green\">Abbe Morio</span>, <span class=\"green\">the Emperor</span>, <span class=\"red\">I shall be delighted to meet them,</span>, <span class=\"green\">the prince</span>, <span class=\"red\">But tell me,</span>, <span class=\"red\">is it true that the Dowager Empress wants Baron Funke\n",
      "to be appointed first secretary at Vienna? The baron by all accounts\n",
      "is a poor creature.</span>, <span class=\"green\">Prince Vasili</span>, <span class=\"green\">Dowager Empress Marya Fedorovna</span>, <span class=\"green\">the baron</span>, <span class=\"green\">Anna Pavlovna</span>, <span class=\"green\">the Empress</span>, <span class=\"red\">Baron Funke has been recommended to the Dowager Empress by her\n",
      "sister,</span>, <span class=\"green\">the Empress</span>, <span class=\"green\">Anna Pavlovna's</span>, <span class=\"green\">Her Majesty</span>, <span class=\"green\">Baron\n",
      "Funke</span>, <span class=\"green\">The prince</span>, <span class=\"green\">Anna\n",
      "Pavlovna</span>, <span class=\"green\">the Empress</span>, <span class=\"red\">Now about your family. Do you know that since your daughter came\n",
      "out everyone has been enraptured by her? They say she is amazingly\n",
      "beautiful.</span>, <span class=\"green\">The prince</span>, <span class=\"red\">I often think,</span>, <span class=\"red\">I often think how unfairly sometimes the\n",
      "joys of life are distributed. Why has fate given you two such splendid\n",
      "children? I don't speak of <span class=\"green\">Anatole</span>, your youngest. I don't like\n",
      "him,</span>, <span class=\"green\">Anatole</span>, <span class=\"red\">Two such charming children. And really you appreciate\n",
      "them less than anyone, and so you don't deserve to have them.</span>, <span class=\"red\">I can't help it,</span>, <span class=\"green\">the prince</span>, <span class=\"red\">Lavater would have said I\n",
      "lack the bump of paternity.</span>, <span class=\"red\">Don't joke; I mean to have a serious talk with you. Do you know I\n",
      "am dissatisfied with your younger son? Between ourselves</span>, <span class=\"red\">he was mentioned at Her\n",
      "Majesty's and you were pitied....</span>, <span class=\"green\">The prince</span>, <span class=\"red\">What would you have me do?</span>, <span class=\"red\">You know I did all\n",
      "a father could for their education, and they have both turned out\n",
      "fools. Hippolyte is at least a quiet fool, but Anatole is an active\n",
      "one. That is the only difference between them.</span>, <span class=\"red\">And why are children born to such men as you? If you were not a\n",
      "father there would be nothing I could reproach you with,</span>, <span class=\"green\">Anna\n",
      "Pavlovna</span>, <span class=\"red\">I am your faithful slave and to you alone I can confess that my\n",
      "children are the bane of my life. It is the cross I have to bear. That\n",
      "is how I explain it to myself. It can't be helped!</span>, <span class=\"green\">Anna Pavlovna</span>]\n"
     ]
    }
   ],
   "source": [
    "allText = bs.find_all('span', {'class':{'green', 'red'}})\n",
    "print([text for text in allText])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "7\n"
     ]
    }
   ],
   "source": [
    "nameList = bs.find_all(text='the prince')\n",
    "print(len(nameList))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[]\n"
     ]
    }
   ],
   "source": [
    "title = bs.find_all(id='title', class_='text')\n",
    "print([text for text in allText])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "<tr><th>\n",
      "Item Title\n",
      "</th><th>\n",
      "Description\n",
      "</th><th>\n",
      "Cost\n",
      "</th><th>\n",
      "Image\n",
      "</th></tr>\n",
      "\n",
      "\n",
      "<tr class=\"gift\" id=\"gift1\"><td>\n",
      "Vegetable Basket\n",
      "</td><td>\n",
      "This vegetable basket is the perfect gift for your health conscious (or overweight) friends!\n",
      "<span class=\"excitingNote\">Now with super-colorful bell peppers!</span>\n",
      "</td><td>\n",
      "$15.00\n",
      "</td><td>\n",
      "<img src=\"../img/gifts/img1.jpg\"/>\n",
      "</td></tr>\n",
      "\n",
      "\n",
      "<tr class=\"gift\" id=\"gift2\"><td>\n",
      "Russian Nesting Dolls\n",
      "</td><td>\n",
      "Hand-painted by trained monkeys, these exquisite dolls are priceless! And by \"priceless,\" we mean \"extremely expensive\"! <span class=\"excitingNote\">8 entire dolls per set! Octuple the presents!</span>\n",
      "</td><td>\n",
      "$10,000.52\n",
      "</td><td>\n",
      "<img src=\"../img/gifts/img2.jpg\"/>\n",
      "</td></tr>\n",
      "\n",
      "\n",
      "<tr class=\"gift\" id=\"gift3\"><td>\n",
      "Fish Painting\n",
      "</td><td>\n",
      "If something seems fishy about this painting, it's because it's a fish! <span class=\"excitingNote\">Also hand-painted by trained monkeys!</span>\n",
      "</td><td>\n",
      "$10,005.00\n",
      "</td><td>\n",
      "<img src=\"../img/gifts/img3.jpg\"/>\n",
      "</td></tr>\n",
      "\n",
      "\n",
      "<tr class=\"gift\" id=\"gift4\"><td>\n",
      "Dead Parrot\n",
      "</td><td>\n",
      "This is an ex-parrot! <span class=\"excitingNote\">Or maybe he's only resting?</span>\n",
      "</td><td>\n",
      "$0.50\n",
      "</td><td>\n",
      "<img src=\"../img/gifts/img4.jpg\"/>\n",
      "</td></tr>\n",
      "\n",
      "\n",
      "<tr class=\"gift\" id=\"gift5\"><td>\n",
      "Mystery Box\n",
      "</td><td>\n",
      "If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class=\"excitingNote\">Keep your friends guessing!</span>\n",
      "</td><td>\n",
      "$1.50\n",
      "</td><td>\n",
      "<img src=\"../img/gifts/img6.jpg\"/>\n",
      "</td></tr>\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "from urllib.request import urlopen\n",
    "from bs4 import BeautifulSoup\n",
    "\n",
    "html = urlopen('http://www.pythonscraping.com/pages/page3.html')\n",
    "bs = BeautifulSoup(html, 'html.parser')\n",
    "\n",
    "for child in bs.find('table',{'id':'giftList'}).children:\n",
    "    print(child)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "<tr class=\"gift\" id=\"gift1\"><td>\n",
      "Vegetable Basket\n",
      "</td><td>\n",
      "This vegetable basket is the perfect gift for your health conscious (or overweight) friends!\n",
      "<span class=\"excitingNote\">Now with super-colorful bell peppers!</span>\n",
      "</td><td>\n",
      "$15.00\n",
      "</td><td>\n",
      "<img src=\"../img/gifts/img1.jpg\"/>\n",
      "</td></tr>\n",
      "\n",
      "\n",
      "<tr class=\"gift\" id=\"gift2\"><td>\n",
      "Russian Nesting Dolls\n",
      "</td><td>\n",
      "Hand-painted by trained monkeys, these exquisite dolls are priceless! And by \"priceless,\" we mean \"extremely expensive\"! <span class=\"excitingNote\">8 entire dolls per set! Octuple the presents!</span>\n",
      "</td><td>\n",
      "$10,000.52\n",
      "</td><td>\n",
      "<img src=\"../img/gifts/img2.jpg\"/>\n",
      "</td></tr>\n",
      "\n",
      "\n",
      "<tr class=\"gift\" id=\"gift3\"><td>\n",
      "Fish Painting\n",
      "</td><td>\n",
      "If something seems fishy about this painting, it's because it's a fish! <span class=\"excitingNote\">Also hand-painted by trained monkeys!</span>\n",
      "</td><td>\n",
      "$10,005.00\n",
      "</td><td>\n",
      "<img src=\"../img/gifts/img3.jpg\"/>\n",
      "</td></tr>\n",
      "\n",
      "\n",
      "<tr class=\"gift\" id=\"gift4\"><td>\n",
      "Dead Parrot\n",
      "</td><td>\n",
      "This is an ex-parrot! <span class=\"excitingNote\">Or maybe he's only resting?</span>\n",
      "</td><td>\n",
      "$0.50\n",
      "</td><td>\n",
      "<img src=\"../img/gifts/img4.jpg\"/>\n",
      "</td></tr>\n",
      "\n",
      "\n",
      "<tr class=\"gift\" id=\"gift5\"><td>\n",
      "Mystery Box\n",
      "</td><td>\n",
      "If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class=\"excitingNote\">Keep your friends guessing!</span>\n",
      "</td><td>\n",
      "$1.50\n",
      "</td><td>\n",
      "<img src=\"../img/gifts/img6.jpg\"/>\n",
      "</td></tr>\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "from urllib.request import urlopen\n",
    "from bs4 import BeautifulSoup\n",
    "\n",
    "html = urlopen('http://www.pythonscraping.com/pages/page3.html')\n",
    "bs = BeautifulSoup(html, 'html.parser')\n",
    "\n",
    "for sibling in bs.find('table', {'id':'giftList'}).tr.next_siblings:\n",
    "    print(sibling) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "$15.00\n",
      "\n"
     ]
    }
   ],
   "source": [
    "from urllib.request import urlopen\n",
    "from bs4 import BeautifulSoup\n",
    "\n",
    "html = urlopen('http://www.pythonscraping.com/pages/page3.html')\n",
    "bs = BeautifulSoup(html, 'html.parser')\n",
    "print(bs.find('img',\n",
    "              {'src':'../img/gifts/img1.jpg'})\n",
    "      .parent.previous_sibling.get_text())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "../img/gifts/img1.jpg\n",
      "../img/gifts/img2.jpg\n",
      "../img/gifts/img3.jpg\n",
      "../img/gifts/img4.jpg\n",
      "../img/gifts/img6.jpg\n"
     ]
    }
   ],
   "source": [
    "from urllib.request import urlopen\n",
    "from bs4 import BeautifulSoup\n",
    "import re\n",
    "\n",
    "html = urlopen('http://www.pythonscraping.com/pages/page3.html')\n",
    "bs = BeautifulSoup(html, 'html.parser')\n",
    "images = bs.find_all('img', {'src':re.compile('\\.\\.\\/img\\/gifts/img.*\\.jpg')})\n",
    "for image in images: \n",
    "    print(image['src'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<img src=\"../img/gifts/logo.jpg\" style=\"float:left;\"/>,\n",
       " <tr class=\"gift\" id=\"gift1\"><td>\n",
       " Vegetable Basket\n",
       " </td><td>\n",
       " This vegetable basket is the perfect gift for your health conscious (or overweight) friends!\n",
       " <span class=\"excitingNote\">Now with super-colorful bell peppers!</span>\n",
       " </td><td>\n",
       " $15.00\n",
       " </td><td>\n",
       " <img src=\"../img/gifts/img1.jpg\"/>\n",
       " </td></tr>,\n",
       " <tr class=\"gift\" id=\"gift2\"><td>\n",
       " Russian Nesting Dolls\n",
       " </td><td>\n",
       " Hand-painted by trained monkeys, these exquisite dolls are priceless! And by \"priceless,\" we mean \"extremely expensive\"! <span class=\"excitingNote\">8 entire dolls per set! Octuple the presents!</span>\n",
       " </td><td>\n",
       " $10,000.52\n",
       " </td><td>\n",
       " <img src=\"../img/gifts/img2.jpg\"/>\n",
       " </td></tr>,\n",
       " <tr class=\"gift\" id=\"gift3\"><td>\n",
       " Fish Painting\n",
       " </td><td>\n",
       " If something seems fishy about this painting, it's because it's a fish! <span class=\"excitingNote\">Also hand-painted by trained monkeys!</span>\n",
       " </td><td>\n",
       " $10,005.00\n",
       " </td><td>\n",
       " <img src=\"../img/gifts/img3.jpg\"/>\n",
       " </td></tr>,\n",
       " <tr class=\"gift\" id=\"gift4\"><td>\n",
       " Dead Parrot\n",
       " </td><td>\n",
       " This is an ex-parrot! <span class=\"excitingNote\">Or maybe he's only resting?</span>\n",
       " </td><td>\n",
       " $0.50\n",
       " </td><td>\n",
       " <img src=\"../img/gifts/img4.jpg\"/>\n",
       " </td></tr>,\n",
       " <tr class=\"gift\" id=\"gift5\"><td>\n",
       " Mystery Box\n",
       " </td><td>\n",
       " If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class=\"excitingNote\">Keep your friends guessing!</span>\n",
       " </td><td>\n",
       " $1.50\n",
       " </td><td>\n",
       " <img src=\"../img/gifts/img6.jpg\"/>\n",
       " </td></tr>]"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bs.find_all(lambda tag: len(tag.attrs) == 2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<span class=\"excitingNote\">Or maybe he's only resting?</span>]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bs.find_all(lambda tag: tag.get_text() == 'Or maybe he\\'s only resting?')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[\"Or maybe he's only resting?\"]"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bs.find_all('', text='Or maybe he\\'s only resting?')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
